DETERMINED DNA SEQUENCES DERIVED FROM A PAPILLOMAVIRUS 
GENOME, THEIR USES FOR IN VITRO DIAGNOSTIC PURPOSES AND 
THE PRODUCTION OF ANTIGENIC COMPOSITIONS 
The invention pertains to determined DNA sequences 
derived from a papillomavirus genome, more particularly 
DNA recombinants, including vectors, modified by such DNA
sequences in such manner that, when said DNA recombinants
are introduced in suitable host cells in which said DNA
recombinants can be replicated, the said DNA sequences can
be expressed in the form of the corresponding proteins.
The invention further relates to the proteins themselves,
which can be purified and used for the production of
immunogenic compositions.

The invention pertains more particularly to DNA 
products of the papillomavirus designated as IP-2 (now re- 
designated as HPV-33) in the European patent application 
filed under number 85.402362.9 on November 29, 1985, the 
contents of which are incorporated herein by reference. A 
plasmid containing the DNA of said virus has been 
deposited at the CNCM ("Collection nationale de Culture de 
Micro-Organismes" of the Pasteur Institute of Paris) under 
number I-450.

Papillomaviruses are members of the papovavirus 
family and possess a genome of about 7,900 base pairs (bp) 
consisting of a covalently closed circular DNA molecule. 
Human papillomaviruses (HPV) are classified on the basis
of their DNA sequence homology (6) and nearly 40 types 
have now been described. Considerable insight into HPV 
biology and their involvement in human disease has been 
attained by the application of the techniques of molecular 
biology. A possible role for HPVs in human cancer was 
suspected following the detection of HPV DNA in tumors 
resulting from the malignant conversion of genital warts 
(33). The cloning of two HPV genomes, HPV-16 and HPV-18
(3, 11) from cervical carcinomas has further stimulated
research in this field of immense socio-economic
importance. These viruses were discovered in more than 70 % of
the malignant genital tumors examined and in many others
HPV-16 related sequences were detected (3, 16, 33).
Amongst these is HPV-33 which was recently cloned from an 
invasive cervical carcinoma using HPV-16 as a probe under 
conditions of reduced stringency (1). In the present study
we have determined the DNA sequence of HPV-33 and describe 
its relationship to HPV-16. Among the papillomaviruses 
HPV-33 is unique as it possesses a 78 bp tandem repeat 
which strongly resembles the enhancer of SV40 (4, 14). 

The invention stems from the cloning strategy, disclosed
hereafter, of the genome of HPV-33, which enabled
particular DNA sequences to be identified, more
particularly those providing hybridization probes that are
useful for the detection of DNA of papillomaviruses
related to HPV-33 in human tissue, whereby positive
responses can be related to the possible development in
the host of invasive cervical carcinomas.

Reference is hereafter made to the drawings, in
which the figures show respectively:

FIGS. 1a and 1b. Nucleotide sequence of HPV-33. Position 1
on the circular genome corresponds to a "HpaI-like"
sequence found by alignment with HPV-6b.

FIG. 2. Distribution of the major reading frames in the
HPV-33 genome. The reading frames were identified by
comparison with other HPV sequences and the stop codons
are represented as vertical bars. Also indicated are the
locations of unique restriction sites (S, SmaI; E, EcoRV;
B2, BglII; B1, BglI) and the likely polyadenylation
signals (PA) for the early and late transcripts. In
addition to these, 6 other potential PA sites (AATAAA)
were detected at positions 862, 1215, 1221, 2666, 5837 and
6239.

FIG. 3. Principal features of the non-coding region. A
section of the non-coding region from positions 7500 to
114 is shown. The 78 bp tandem repeats are overlined and
those regions resembling the Z-DNA forming element of the
SV40 enhancer are indicated. Potential promoter elements
are denoted by stars and the 3 copies of the 12 bp
palindrome are enclosed between two rows of dots.

Preferred sequences are those which encode full
proteins, more particularly and respectively the
nucleotidic sequences having the open reading frames referred to
in Table 1 hereafter.

The conditions under which the DNA sequence
analysis was performed are defined under the heading
"MATERIALS AND METHODS" hereafter. The conclusions which
were drawn from this sequence analysis appear under the
heading "RESULTS AND DISCUSSION".

MATERIALS AND METHODS 
DNA sequence analysis. The source of HPV-33 sequenced in
this study was plasmid p15-5 (1) which consists of a
BglII-linearized HPV-33 genome cloned in a pBR322 derivative. A
library of random DNA fragments (400-800 bp) was prepared
in M13mp8 (17) after sonication and end-repair of p15-5,
essentially as described previously (28). DNA sequencing
was performed by the dideoxy chain termination method (19,
20) with the modifications of Biggin et al. (2). Most of
the sequence was derived in this way although part of the
non-coding region was found to be absent or
under-represented in the M13 library (> 300 clones). The sequence of
this region was obtained directly from p15-5 using the
method of Smith (24). Briefly, restriction fragments
isolated from 2 "complementary" M13 clones were used to prime
DNA synthesis on templates prepared from p15-5 which had
been linearized with a restriction enzyme and then treated
with exonuclease III (200 units/pmol DNA for 1 h at 22°C).

Computer analysis. DNA sequences were compiled and
analysed with the programs of Staden (26, 27) as modified
by B. Caudron. Optimal alignments of DNA or protein
sequences were obtained using the algorithm developed by
Wilbur and Lipman (31).

RESULTS AND DISCUSSION 
Genomic Arrangement of HPV-33 - The complete 7909
nucleotide sequence of HPV-33, determined by the M13 shotgun
cloning/dideoxy sequencing approach, is presented in Fig.
1. On average each position was sequenced 6.5 times. In
agreement with the convention for other papillomavirus
sequences the numbering begins at a site resembling the
recognition sequence for HpaI in the non-coding region.
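
By way of illustration only: because the genome is circular and position 1 is fixed by convention, a region may run past position 7909 and continue at position 1, as does the section of the non-coding region shown in Fig. 3 (positions 7500 to 114). The following Python sketch shows one way such 1-based, inclusive coordinates could be handled. The function names are hypothetical and the sequence used in the usage line is a toy string, not HPV-33 sequence.

```python
GENOME_LENGTH = 7909  # length of the HPV-33 genome as stated in the text


def circular_span(start, end, genome_length=GENOME_LENGTH):
    """Count the positions from start to end (1-based, inclusive).

    If start > end the region is taken to wrap through position 1.
    """
    for pos in (start, end):
        if not 1 <= pos <= genome_length:
            raise ValueError("position outside the genome")
    if start <= end:
        return end - start + 1
    return (genome_length - start + 1) + end


def circular_slice(seq, start, end):
    """Return seq[start..end] (1-based, inclusive) from a circular sequence."""
    if start <= end:
        return seq[start - 1:end]
    # Wrapping region: tail of the sequence followed by its head.
    return seq[start - 1:] + seq[:end]


# Toy usage on a 10-letter circular string (hypothetical data).
print(circular_slice("ABCDEFGHIJ", 9, 2))
```

With the genome length given above, `circular_span(7500, 114)` evaluates to 524 positions.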

An analysis of the distribution of nonsense codons
(Fig. 2) shows that, as in all other sequenced
papillomaviruses, the 8 major open reading frames are located on
the same strand. Some features common to HPV-33 and HPV
types 1a, 6b and 16 together with the cottontail rabbit
papillomavirus and the prototype bovine papillomavirus,
BPV-1, (5, 7, 8, 13, 21, 22) include the overlap between
the largest open reading frames in the early region, E1
and E2, and the inclusion of E4 within the section
encoding E2. Interestingly, the BglII site used in the
molecular cloning of HPV-33 is situated within the E1/E2
overlap. Another property common to all papillomaviruses,
except BPV-1, is the overlap between the L1 and L2 reading
frames. Following L1 is the 892 bp non-coding region
which, by analogy with BPV-1 (15, 29) undoubtedly contains
the origin of replication and various transcriptional
regulatory elements. The principal characteristics of the
HPV-33 genome are summarized in Table 1.
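
By way of illustration only, a nonsense-codon map of the kind summarized in Fig. 2 could be produced along the following lines. This Python sketch is not the Staden software used in the study. It assumes a single strand treated as a linear string (wrap-around at position 1 is ignored), and the function names and the toy sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_positions(seq, frame):
    """1-based position of the first base of every stop codon in a frame (0, 1 or 2)."""
    seq = seq.upper()
    return [i + 1
            for i in range(frame, len(seq) - 2, 3)
            if seq[i:i + 3] in STOP_CODONS]


def open_stretches(seq, frame, min_codons=1):
    """Stop-free stretches in a frame as (start, end), 1-based and inclusive.

    The stop codon itself is excluded from each stretch.
    """
    seq = seq.upper()
    stretches = []
    start = frame   # 0-based index where the current stretch begins
    last = frame    # 0-based index just past the last complete codon
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            if (i - start) // 3 >= min_codons:
                stretches.append((start + 1, i))
            start = i + 3
        last = i + 3
    if (last - start) // 3 >= min_codons:
        stretches.append((start + 1, last))
    return stretches


# Toy sequence (hypothetical, not HPV-33 sequence).
toy = "ATGAAATAACCCGGGTGA"
for frame in range(3):
    print(frame, stop_positions(toy, frame), open_stretches(toy, frame))
```

Raising `min_codons` keeps only the longer stop-free stretches, which is how major reading frames would stand out from short chance openings.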

Nucleotide Sequence Comparison with HPV-16 - HPV-16 is the
only other oncogenic papillomavirus, isolated from tumors
of the ano-genital region, which has been completely
sequenced (22). The gross features of HPV-33 resemble those
of HPV-16 except that the E1 reading frame of the latter
is interrupted. All of the coding sequences in HPV-33,
except that of E5, are slightly shorter than their
counterparts in HPV-16. This may contribute to the fact
that its non-coding region, between L1 and E6 (Fig. 2), is
76 bp longer thereby keeping the genomes nearly constant
in size.

When the open reading frames were compared
pairwise (Table 2) it was found that E1, E2, E6, E7, L1 and L2
displayed 65-75 % homology whereas those for E4
and E5 were more divergent (about 50 % homology). These
findings confirm the heteroduplex analysis performed
previously (1). A comparative study (8) of papillomavirus E1
gene products showed that the polypeptide consists of an
NH2-terminal segment whose sequence is highly variable,
and a COOH-terminal domain of well-conserved primary
structure. The longest stretch of perfect sequence
homology, 33 nucleotides (positions 1275-1307, Fig. 1) is
found near the 5'-end of the E1 reading frame in a region
encoding the variable domain of the polypeptide. Several
other regions of complete identity (19-28 nucleotides)
were detected elsewhere in E1, and also in E2, L2 and L1.
As many of these sequences are not found in the genomes of
other HPVs, such as HPV-1a and HPV-6b, this raises the
possibility that the corresponding oligonucleotides could
be produced and used as diagnostic hybridization probes
for screening biopsy material from potentially tumorigenic
lesions.
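
By way of illustration only, the search for stretches of perfect identity shared by two genomes, and the check that a candidate oligonucleotide is absent from other genomes, could be sketched in Python as follows. This is a plain dynamic-programming longest-common-substring search, not the procedure used in the study; the function names are hypothetical, only the given strand is examined, and the sequences are toy strings rather than viral sequence.

```python
def longest_common_substring(a, b):
    """Return (length, start_in_a, start_in_b) of the longest exact shared run.

    Starts are 1-based; (0, 0, 0) is returned when nothing is shared.
    """
    best_len, best_a, best_b = 0, 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the run that ended at the previous pair of positions.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_a = i - best_len + 1
                    best_b = j - best_len + 1
        prev = curr
    return best_len, best_a, best_b


def absent_from(probe, other_genomes):
    """True if the probe sequence occurs in none of the other genomes."""
    return all(probe not in genome for genome in other_genomes)


# Toy sequences (hypothetical data).
genome_a = "GGATCCATTGACCGTA"
genome_b = "TTCATTGACCAAG"
length, start_a, start_b = longest_common_substring(genome_a, genome_b)
probe = genome_a[start_a - 1:start_a - 1 + length]
print(length, start_a, start_b, probe)
print(absent_from(probe, ["AAAAACCCCCGGGGG"]))
```

The quadratic cost is acceptable for genomes of about 8 kb; a practical screen would also test the reverse complement of each candidate probe.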

Potential Gene Products - The papillomavirus gene products
may be divided into those which are believed to play a
purely structural role, L1 and L2, and those required for
viral propagation and persistence. The results of a
comparison of the probable products of the major reading frames
from HPVs-33, 16 and 6b are summarized in Table 2. As
expected there is strong identity between the oncogenic
HPVs-33 and 16, particularly for the proposed E1, E6, E7,
L2 and L1 proteins. When conservative substitutions are
included the homology between the two L1 polypeptides
increases to 90 %, suggesting that the corresponding
capsids must be antigenically related. In contrast,
significantly weaker homologies were detected when the
analysis was extended to include the benign genital
wart-forming HPV-6b (Table 2). Comparison of the HPV-16
proteins with those of HPV-6b revealed slightly more
homology than was found with HPV-33, suggesting a closer
evolutionary relationship.
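
By way of illustration only, a percentage homology of the kind reported in Table 2 could be computed from two sequences that have already been aligned. The Python sketch below does not implement the Wilbur and Lipman alignment itself. It assumes that gaps are written as "-", that every alignment column containing at least one residue counts in the denominator, and that the conservative substitution groups listed are acceptable; all three are assumptions of this sketch and are not taken from the study. The sequences are toy strings.

```python
# Hypothetical grouping of amino acids treated as conservative substitutions.
CONSERVATIVE_GROUPS = ("ILVM", "FYW", "KRH", "DE", "ST", "NQ")


def is_conservative(x, y):
    """True if both residues fall in the same (assumed) conservative group."""
    return any(x in group and y in group for group in CONSERVATIVE_GROUPS)


def percent_homology(aligned_a, aligned_b, count_conservative=False, gap="-"):
    """Percentage of alignment columns that match.

    Columns where both sequences carry a gap are skipped; a column with a
    gap in one sequence counts as compared but never as a match.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap and y == gap:
            continue
        compared += 1
        if gap in (x, y):
            continue
        if x == y or (count_conservative and is_conservative(x, y)):
            matches += 1
    if compared == 0:
        raise ValueError("no aligned positions to compare")
    return 100.0 * matches / compared


# Toy aligned peptides (hypothetical data).
print(round(percent_homology("MKT-LV", "MRTALV"), 1))
print(round(percent_homology("MKT-LV", "MRTALV", count_conservative=True), 1))
```

Counting conservative substitutions can only raise the figure, which is the direction of change described above for the two L1 polypeptides.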

The Non-coding Region - The non-coding region of HPV-33
displays several unique properties and bears only weak
resemblance to its homologue in HPV-16. Located downstream of
the L1 stop codon and including the putative
polyadenylation signal for the late transcripts is a stretch of 223
bp (positions 7097-7320, Fig. 1) unusually rich in T + G
(79 %). Contained within this segment are two copies of a
19 bp direct repeat (with one mismatch) and 7 copies of
the motif TTGTRTR (where R is A or G). The latter is also
found 7 times in the corresponding region of HPV-16,
suggesting that it may represent a recognition site for
proteins involved in replication. It should be noted that
nascent replication forks have been localised in this
region of the BPV-1 genome (29) and that the origin of
replication of the Epstein-Barr virus consists of a family
of repeated sequences (32).
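
By way of illustration only, occurrences of a degenerate motif such as TTGTRTR and the T + G content of a segment could be obtained as in the following Python sketch. The helper names are hypothetical, only a small subset of the IUPAC ambiguity codes is included, only the given strand is scanned, and the sequence is a toy string, not HPV-33 sequence.

```python
import re

# Subset of the IUPAC nucleotide codes, expressed as regular-expression classes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}


def motif_regex(motif):
    """Compile a degenerate motif into a regex that also finds overlapping hits."""
    pattern = "".join(IUPAC[base] for base in motif.upper())
    # A lookahead makes each match zero-width, so overlapping sites are reported.
    return re.compile("(?=(" + pattern + "))")


def find_motif(seq, motif):
    """1-based start positions of every occurrence of the motif."""
    return [m.start() + 1 for m in motif_regex(motif).finditer(seq.upper())]


def tg_content(seq):
    """Percentage of positions that are T or G."""
    seq = seq.upper()
    return 100.0 * (seq.count("T") + seq.count("G")) / len(seq)


# Toy sequence (hypothetical data).
toy = "AATTGTATGCCTTGTGTATT"
print(find_motif(toy, "TTGTRTR"))
print(round(tg_content(toy), 1))
```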

A 12 bp palindrome (ACCG....CGGT) that occurs
exclusively in the non-coding region of all papillomavirus
genomes examined was recently reported by Dartmann et al.
(9). Three copies were found in the HPV-33 genome (Fig. 3)
and these occupy the same positions in the non-coding
region of HPV-16. A role for the palindrome as a possible
control site for the early promoter was proposed (4, 9,
15) and indirect support is provided by our finding that
the non-coding regions of HPVs, such as HPV-33, do not
display the clustered arrangement of recognition sites for
the promoter-specific activation factor Sp1 (12). This is
in direct contrast to the situation in another
papovavirus, SV40 (12, 14).
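
By way of illustration only, copies of the 12 bp palindrome could be located with a pattern search such as the following Python sketch. It assumes that the four dots in ACCG....CGGT stand for any four nucleotides; that reading, the function names and the toy sequence are assumptions of this sketch.

```python
import re

# ACCG, any four nucleotides, CGGT; the lookahead allows overlapping hits.
PALINDROME = re.compile(r"(?=(ACCG[ACGT]{4}CGGT))")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_palindromes(seq):
    """List of (1-based start, matched 12-mer) for every copy found."""
    return [(m.start() + 1, m.group(1))
            for m in PALINDROME.finditer(seq.upper())]


# Toy sequence (hypothetical data).
toy = "TTACCGAAAACGGTCCACCGTTTTCGGTAA"
for start, site in find_palindromes(toy):
    print(start, site)

# The two 4 bp arms are reverse complements of one another.
print(reverse_complement("ACCG"))
```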

The most striking feature of HPV-33 is a perfect 
78 bp tandem repeat located 200 bp after the putative 
origin of replication (Fig. 3). No other repeats of this 
size or sequence have been described in the genomes of 
other papillomaviruses. The presumed early promoter for 
HPV-33 is located about 300 bp downstream from the tandem 
repeat and the characteristic promoter elements (4) could 
be identified (Fig. 3). The size, position and arrangement 
of the 78 bp repeats in the HPV-33 genome suggest that 
they may function as enhancers of viral transcription. 
Tandem repeats of 72, 73 and 68 bp have been located near
the early promoter of SV40 (4, 14), in the LTR of Moloney
murine sarcoma virus (10), and in the BK virus genome (23)
and shown to enhance transcription from Pol II-dependent
promoters in a cis-active manner. From mutagenesis of the
SV40 enhancer (14, 30) and sequence comparisons of
characterized transcriptional activators a consensus enhancer
sequence was derived. This structure could not be detected
in the 78 bp repeat but a potential Z-DNA forming region
was uncovered. Z-DNA is believed to attract regulatory
molecules to eukaryotic promoters and a Z-DNA antibody
binding site has been demonstrated within the SV40
enhancer (18). The sequence to which this antibody binds
is also found, albeit with a single mismatch, in the
putative HPV-33 enhancer (positions 7520-7527, 7599-7606,
Figs. 1, 3).
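
By way of illustration only, a perfect tandem repeat of a given unit length, such as the 78 bp repeat described above, could be detected by comparing each window with the window that immediately follows it. The Python sketch below uses a hypothetical function name, treats the sequence as linear, and is run on a toy sequence with an 8 bp unit rather than on HPV-33 sequence.

```python
def find_perfect_tandem_repeats(seq, unit_length):
    """1-based start positions p such that the unit_length bases starting at p
    are followed immediately by an identical copy."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * unit_length + 1):
        first = seq[i:i + unit_length]
        second = seq[i + unit_length:i + 2 * unit_length]
        if first == second:
            hits.append(i + 1)
    return hits


# Toy sequence: two copies of an 8 bp unit flanked by unrelated bases.
unit = "ACGTTGCA"
toy = "GG" + unit * 2 + "TT"
print(find_perfect_tandem_repeats(toy, len(unit)))
```

For the viral genome the call would use a unit length of 78; a tolerance for mismatches could be added to find imperfect repeats such as the 19 bp direct repeat mentioned earlier.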

The proposed HPV-33 enhancer shows no extended
sequence homology to the well-characterized enhancers nor
to other papillomavirus regulatory regions. However, it
has recently been demonstrated that an enhancer-like
element is located in the non-coding region of BPV-1 and
that it requires the E2 product for activation (25). These
findings support our proposal that the 78 bp tandem
repeats could have enhancer function and may indicate that
the relatively low homology (Table 2) between the E2
proteins of HPV-33 and 16 reflects a specificity for the
corresponding enhancer/regulatory regions.

Tables 1 and 2 which have been referred to in the 
instant disclosure follow. 



TABLE 1. Principal features of the HPV-33 genome

Open Reading              First      Stop
Frame           Start     ATG        Codon         mol.wt. (a)

E6                76       109        556 TGA      17 632
E7               543       573        854 TAA      10 825
E1               867       879       2811 TGA      72 387
E2              2728      2749       3808 TAA      40 207
E4              3326        -        3575 TAG       9 452
E5              3842        -        4079 TAA       9 385
L2              4198      4210       5161 TAG      50 539
L1              5516      5594       7091 TAA      55 839

a. Calculated from the first ATG where this exists or from the
start of the open reading frame.



TABLE 2. Comparison of HPV proteins (a)

                      HPVs
Protein     33v16       33v6b       16v6b

E6          65(70)      36(51)      37
E7          61(69)      55(60)      56
E1          61(69)      50(60)      53
E2          53(65)      46(58)      45
E4          52(55)      39(46)      48
E5          40(52)      39(43)      33
L2          64(66)      52(58)      53
L1          81(75)      68(69)      71

a. Expressed as % homology after alignment with the program of
Wilbur and Lipman (31). Values in parentheses represent % nucleotide
sequence homology.

The invention relates more particularly to
sequences corresponding to the open reading frames of E6,
E7, E1, E2, E4, E5, L2, L1.
The invention pertains also to the uses of these
sequences as hybridization probes, either those which are
useful also for the detection of other papillomaviruses,
thus of groups of papillomaviruses - such as probes
containing part or all of the open reading frames
corresponding to L1 - or those which are more virus-specific,
i.e. probes containing part or all of the open reading
frame corresponding to.

It also relates to other probes which detect
sub-groups of papillomaviruses, particularly probes for
the detection of viruses which can be related to major
classes of diseases, i.e. viruses associated with tumors.
By way of example of one of said probes one should mention
that which contains the sequence positioned between
nucleotides 1275 and 1307 according to the numbering of
the nucleotides in Figs. 1a and 1b.

Needless to say, the invention also pertains
to all of said DNA sequences when labelled by a suitable
label, i.e. a radioactive, enzymatic or immunofluorescent
label.

DNAs derived from the viral genome and which carry
nucleotides modified by a chemical group which can be
recognized by antibodies also form part of the invention.
It is well known that such DNAs can be produced by
nick-translation in the presence of nucleotides modified
accordingly. These DNAs form particularly valuable
hybridization probes which, when hybridized to a DNA
preparation containing the complementary strand sought, can be
detected by the above mentioned antibodies.

The invention also pertains to the diagnostic
methods per se. Suitable methods are exemplified
hereafter.

Several hybridization methods may be used. For
example, the spot hybridization method includes, after
denaturation of the DNA, the deposition of an aliquot of
the DNA onto film supports (nitrocellulose or
Genescreenplus), the hybridization of each film under the
usual conditions with the probe, and the detection of the
radioactive hybrid by contact exposure of the hybridized
film onto radiographic film. Another possibility is
replicated culture hybridization which involves agarose gel
electrophoresis separation of the DNA fragments resulting
from treatment of the DNA by restriction enzymes, the
transfer of the fragments after alkaline denaturation onto
films (nitrocellulose or Genescreenplus) and their
hybridization under usual conditions with different
mixtures of probes. The formation of radioactive hybrids
is detected again by contact exposure of the
hybridization support films onto radiographic film.

For instance the probes of the invention can be
used for the detection of the relevant viruses (or DNAs
thereof) in preparations consisting of a biopsy of cells
obtained by scraping a lesion, or of biopsy sections fixed
with Carnoy's mixture (ethanol, chloroform, acetic acid
6:3:1) and embedded in paraffin.

The above nucleotide sequences can be inserted in
vectors, to provide modified vectors which, when
introduced in the suitable cell host, are capable of providing
for the transcription and, where appropriate, translation
of said DNA sequences to produce the corresponding
proteins which can then be isolated from cellular extracts
of the hosts. Obviously it is within the knowledge of the
man skilled in the art to select the appropriate vectors,
particularly in relation to the host to be transformed
therewith. Vectors consist for instance of plasmids or
phages which will be selected according to their
recognized capability of replicating in the corresponding
procaryotic cells (or yeast cells) and of allowing for the
expression of the DNA sequence which they carry.

The invention also relates to DNA recombinants
containing an insert consisting of a DNA sequence
corresponding to any of the above-defined open reading frames or
of a part thereof, and suitably engineered to allow for
the expression of the insert in eucaryotic cells,
particularly cells of warm-blooded animals. Suitable DNA
recombinants are genetic constructs in which said insert has
been placed under the control of a viral or eucaryotic
promoter recognized by the polymerases of the selected
cells and which further comprise suitable polyadenylation
sites downstream of said insert.

By way of example, the invention pertains to DNA
recombinants containing any of the above-mentioned
open-reading inserts placed under the control of a
promoter derived from the genome of the SV40 virus. Such
DNA recombinants - or vectors - can be used for the
transformation of higher eucaryotic cells, particularly
cells of mammals (for instance Vero cells). The invention
further pertains to portions of the above identified DNA
sequences which, when inserted in similar vectors, are
able to code for portions of the corresponding proteins
which have immunological properties similar to those
encoded by the full nucleotide sequences mentioned above.
The similarity of immunological properties can be
recognized by the capacity of the corresponding
polypeptides produced by the relevant host to be recognized by
antibodies previously formed against the proteins produced
by the cells previously transformed with vectors
containing the above mentioned entire DNA sequences.

It goes without saying that the invention also
pertains to any nucleotidic sequence related to the
preceding ones which may be obtained at least in part
synthetically, and in which the nucleotides may vary
within the constraints of the genetic code, to the extent
where these variations do not entail a substantial
modification of the polypeptidic sequences encoded by the
so-modified nucleotidic sequences.
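
By way of illustration only, the notion of nucleotide variation within the constraints of the genetic code can be checked by translating two sequences and comparing the encoded polypeptides. The Python sketch below assumes the standard genetic code and reading from the first base; the function names are hypothetical and the sequences are toy strings.

```python
# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq):
    """Translate a DNA string codon by codon ('*' marks a stop codon)."""
    seq = seq.upper()
    return "".join(CODON_TABLE[seq[i:i + 3]]
                   for i in range(0, len(seq) - 2, 3))


def encodes_same_polypeptide(seq_a, seq_b):
    """True if both nucleotide sequences translate to the same polypeptide."""
    return translate(seq_a) == translate(seq_b)


# Toy sequences differing only at synonymous positions (hypothetical data).
print(translate("ATGCTGAAA"), translate("ATGTTAAAG"))
print(encodes_same_polypeptide("ATGCTGAAA", "ATGTTAAAG"))
print(encodes_same_polypeptide("ATGCTGAAA", "ATGCCGAAA"))
```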

It already flows from the preceding discussion
that the invention also pertains to the purified proteins
or polypeptides themselves as obtainable by the methods
discussed hereabove. These polypeptides, when produced in
a suitable host, can either be obtained from the cells,
for instance after rupturing of their cell walls, or from
the culture medium of said cells when excreted in said
cell medium, depending on the cell DNA recombinant system
which is used. The polypeptide obtained can then be
purified by resorting to usual purification procedures. It
should be understood that "purified" in the instant
context means a level of purity such that, when
electrophoresed in SDS-PAGE, the purified proteins yield a single
detectable band, say by Western blot.

The viral proteins obtained, more particularly the
structural proteins, for instance as a result of the
expression of said DNA sequences in E. coli, can be used
for the in vitro detection of antibodies against
papillomavirus likely to be detected in tissue samples of
patients possibly infected with papillomavirus.

Of particular relevance are the genetically
engineered proteins having the peptidic sequences which can be
deduced from the L1 and L2 open reading frames. Another
peptide of interest is the E6* protein (E6 star), the
synthesis of which can be induced by splicing and which is
encoded by a nucleotidic sequence located between
nucleotides 229 (donor site) and 404 (acceptor site) of the
HPV-33 sequence (see more particularly Fig. 1a), which sites
also define the putative splicing sites in the E6* open
reading frame of HPV-33. Reference may be had to the
publication of Schneider-Gädicke and Schwarz, EMBO J.,
5, 2285-2292, as concerns the conditions of the production
of such proteins.

These purified polypeptides can in turn be used
for the production of corresponding antibodies which can
be used for diagnosing in vitro the presence of viral
polypeptides in a biological fluid, particularly in a
serum or tissue culture of a patient. As in the
preceding instance, the invention relates to portions of the
above defined polypeptides, particularly those which are
recognized by the same antibodies or to the contrary are
able to elicit in vivo the production of antibodies
recognizing the complete proteins.

It must be understood that the invention relates
also specifically to the particular peptides encoded by
the DNA regions specifically referred to in the preceding
disclosure and which have been found of particular
interest.

The invention further concerns host cells
transformed with DNA recombinants containing nucleotidic
sequences directing the expression of the different
peptides mentioned hereabove, and effectively capable of
producing said peptides when cultured in an appropriate
culture medium.

The invention finally also pertains more
particularly to the antibodies themselves which can be obtained
from an animal, such as a rabbit, immunized in standard
manner with said purified polypeptides and/or from
hybridomas previously prepared also in any known manner. Of
particular interest are the antibodies (polyclonal and
monoclonal antibodies) directed against the structural
proteins. These antibodies are useful for the detection of
viral infection. The antibodies which recognize the L1,
L2 and E6 proteins of HPV-33 are of particular
significance. Antibodies specific to L2 provide diagnostic
tools for the in vitro detection of specific viruses
sharing with HPV-33 a sequence encoding a similar L2
protein. Antibodies specific to L1 are useful for the
detection of the groups of viruses to which HPV-33
belongs. Antibodies specific to the E6* protein are useful
for the detection of the oncogenic character of the virus
causing the abovesaid viral infection.

The invention also relates to intergenic sequences
of particular interest, particularly the 78 bp sequence.
This sequence is of particular interest as a possible
insert in eucaryotic vectors, particularly in a position
upstream of the promoter and downstream of the site at
which transcription of the gene or nucleotide sequence, the
transcription of which is sought, is initiated in the
relevant host.

All documents referred to herein are incorporated 
herein by reference. Particularly these documents can be 
referred to as concerns the definition of expressions used 
in this application where appropriate. As such they form 
part of the present disclosure.